nlp_architect.data.intent_datasets.TabularIntentDataset

class nlp_architect.data.intent_datasets.TabularIntentDataset(train_file, test_file, sentence_length=30, word_length=12)

Tabular intent/slot tags dataset loader. Compatible with many sequence tagging datasets (ATIS, CoNLL, etc.). The data must be in a tabular format where:

  • each line holds one word with its tag annotation and intent type, separated by tabs: <token> <tag_label> <intent>
  • sentences are separated by an empty line
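The file layout described above can be illustrated with a small, self-contained sketch. It does not use the library's own parser: `read_tabular_sentences`, `write_sample`, and the sample sentences are hypothetical and exist only to show how tab-separated lines and empty-line sentence breaks fit together.

```python
import os
import tempfile

# Hypothetical sample in the layout described above:
# <token>\t<tag_label>\t<intent>, with an empty line between sentences.
SAMPLE = (
    "show\tO\tatis_flight\n"
    "flights\tO\tatis_flight\n"
    "to\tO\tatis_flight\n"
    "boston\tB-toloc.city_name\tatis_flight\n"
    "\n"
    "what\tO\tatis_airfare\n"
    "fares\tO\tatis_airfare\n"
)


def write_sample(path, text=SAMPLE):
    """Write the illustrative sample to disk."""
    with open(path, "w", encoding="utf-8") as fp:
        fp.write(text)


def read_tabular_sentences(path):
    """Group tab-separated lines into sentences, splitting on empty lines.

    This is an illustrative reader, not the parser used by the library.
    """
    sentences, current = [], []
    with open(path, encoding="utf-8") as fp:
        for line in fp:
            line = line.rstrip("\n")
            if not line.strip():
                # An empty line closes the current sentence.
                if current:
                    sentences.append(current)
                    current = []
                continue
            token, tag, intent = line.split("\t")
            current.append((token, tag, intent))
    # Keep the last sentence when the file has no trailing empty line.
    if current:
        sentences.append(current)
    return sentences


with tempfile.TemporaryDirectory() as tmp:
    sample_path = os.path.join(tmp, "train.tsv")
    write_sample(sample_path)
    parsed = read_tabular_sentences(sample_path)

print(len(parsed), "sentences")
```

Each parsed sentence is a list of `(token, tag_label, intent)` tuples, so the intent is repeated on every line of a sentence, as the format requires.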
Parameters:
  • train_file (str) – path to train set file
  • test_file (str) – path to test set file
  • sentence_length (int) – max sentence length
  • word_length (int) – max word length
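A minimal usage sketch, assuming the nlp_architect package is installed and that the train and test files follow the tabular format described above. It relies only on the constructor signature and attributes documented on this page; `summarize_intent_dataset` is a hypothetical helper, not part of the library.

```python
def summarize_intent_dataset(train_file, test_file):
    """Load a tabular intent dataset and collect its documented size attributes."""
    # Imported lazily so the helper can be defined without the package installed.
    from nlp_architect.data.intent_datasets import TabularIntentDataset

    dataset = TabularIntentDataset(
        train_file,
        test_file,
        sentence_length=30,  # max sentence length (documented default)
        word_length=12,      # max word length (documented default)
    )
    return {
        "word_vocab_size": dataset.word_vocab_size,
        "char_vocab_size": dataset.char_vocab_size,
        "label_vocab_size": dataset.label_vocab_size,
        "intent_size": dataset.intent_size,
        # train_set and test_set are documented as tuples of numpy.ndarray.
        "train_shapes": [array.shape for array in dataset.train_set],
        "test_shapes": [array.shape for array in dataset.test_set],
    }
```

It could be called as `summarize_intent_dataset("train.tsv", "test.tsv")`, where both paths are placeholders for real dataset files.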
__init__(train_file, test_file, sentence_length=30, word_length=12)

Initialize the loader from the train and test files, using the given maximum sentence and word lengths.

Methods

__init__(train_file, test_file[, …]) Initialize self.

Attributes

char_vocab – word character vocabulary
char_vocab_size – char vocabulary size
files
intent_size – intent label vocabulary size
intents_vocab – intent labels vocabulary
label_vocab_size – label vocabulary size
tags_vocab – labels vocabulary
test_set – test set
train_set – train set
word_vocab – tokens vocabulary
word_vocab_size – vocabulary size
char_vocab

word character vocabulary

Type: dict
char_vocab_size

char vocabulary size

Type: int
files = ['train', 'test']
intent_size

intent label vocabulary size

Type: int
intents_vocab

intent labels vocabulary

Type: dict
label_vocab_size

tag label vocabulary size

Type: int
tags_vocab

tag labels vocabulary

Type: dict
test_set

test set

Type: tuple of numpy.ndarray
train_set

train set

Type: tuple of numpy.ndarray
word_vocab

tokens vocabulary

Type: dict
word_vocab_size

word vocabulary size

Type: int